Improving Web Query Classification by Latent Topic Analysis

Author

  • Cheng-Zen Yang
Abstract

As Web search engines play an important role in helping people find the information they need in massive Web data, Web query classification (WQC) has become an important research issue. In contrast to traditional classification problems, WQC aims to classify Web queries into relevant categories of a Web taxonomy. In addition to the typical challenge of processing short and ambiguous queries, WQC faces the practical difficulty of classifying queries into more than one category. In this paper, we propose a semantic-based scheme that uses the Latent Dirichlet Allocation (LDA) model to exploit the latent topic semantics of expanded queries and Web categories for WQC. To evaluate its effectiveness, we conducted experiments comparing it with the algorithm proposed by Shen et al. in 2005. The results show that our approach achieves improvements of 6.5% and 6.6% in the top-5 classification results for precision and F1, respectively.

Keywords: Web Query Classification, Query Expansion, Latent Topic Analysis, Weighted Ensemble
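
Because each query may belong to several categories, top-5 precision and F1 are computed over sets of labels rather than a single label. The following is a minimal sketch of one way to compute such scores, assuming micro-averaged definitions (counts pooled over all queries); the abstract does not state the exact evaluation protocol, and the function name `top_k_scores` and the toy queries and categories are hypothetical.

```python
from typing import Dict, List, Set


def top_k_scores(ranked: Dict[str, List[str]],
                 gold: Dict[str, Set[str]],
                 k: int = 5) -> Dict[str, float]:
    """Micro-averaged precision, recall, and F1 over the top-k categories per query.

    Assumption: counts are pooled across all queries before dividing.
    """
    correct = predicted = relevant = 0
    for query, categories in ranked.items():
        top = categories[:k]                 # keep only the k best-ranked categories
        truth = gold.get(query, set())       # gold labels; a query may have several
        correct += sum(1 for c in top if c in truth)
        predicted += len(top)
        relevant += len(truth)
    precision = correct / predicted if predicted else 0.0
    recall = correct / relevant if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: ranked category predictions and gold labels.
ranked = {
    "jaguar": ["Autos", "Animals", "Sports"],
    "python tutorial": ["Programming", "Animals"],
}
gold = {
    "jaguar": {"Autos", "Animals"},
    "python tutorial": {"Programming"},
}
scores = top_k_scores(ranked, gold, k=2)
```

Pooling the counts keeps queries with many gold categories from being weighted the same as queries with one; a per-query (macro) average would be an equally reasonable alternative.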

Similar resources

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users alike. Constructing an adequate query that best specifies the user’s information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

Query expansion based on contextual meaning of the query terms

In this paper, a new method for query expansion is proposed that expands the original query based on its contextual meaning. The user provides the method with an original query and a set of relevant documents. The term space extracted from the relevant documents by the vector space model is transformed to a topic space by means of latent semantic analysis. The terms’ projections in the topic...

Analysis of users’ query reformulation behavior in Web with regard to Wholistic/analytic cognitive styles, Web experience, and search task type

Background and Aim: The basic aim of the present study is to investigate users’ query reformulation behavior with regard to wholistic-analytic cognitive styles, search task type, and experience in using the Web. Method: This is an applied study using the survey method. A total of 321 search queries were submitted by 44 users. Data collection tools were Riding’s Cognitive Style A...

Improving Web Page Classification by Integrating Neighboring Pages via a Topic Model

This paper applies a topic model to represent the feature space for learning the Web page classification model. The Latent Dirichlet Allocation (LDA) algorithm is applied to generate a probabilistic topic model consisting of term features clustered into a set of latent topics. Words assigned to the same topic are semantically related. In addition, we propose a method to integrate the additional t...

Micro-blog Personalized Query Expansion Based on Latent Topic Classification

With the increasing maturity of Web 2.0 technology and the development of micro-blogs, the number of micro-blog pages is rising exponentially. Relying only on traditional micro-blog search engines no longer meets users’ requirements. To address the inadequate retrieval efficiency of the traditional micro-blog search method, and inspired by probabilistic latent seman...

Publication date: 2011